Image Mining of Historical Manuscripts to Establish Provenance
Authors
Abstract
The recent digitization of more than twenty million books has been led by initiatives from countries wishing to preserve their cultural heritage and by commercial endeavors, such as the Google Print Library Project. Within a few years a significant fraction of the world’s books will be online. For millions of intact books and tens of millions of loose pages, the provenance of the manuscripts may be in doubt or completely unknown, thus denying historians an understanding of the context of the content. In some cases it may be possible for human experts to recover the provenance by examining linguistic, cultural and/or stylistic clues. However, such experts are rare and this investigation is clearly a time-consuming process. One technique used by experts to establish provenance is the examination of the ornate initial letters appearing in the questioned manuscript. By comparing the initial letters in the manuscript to annotated initial letters whose origin is known, the provenance can be determined. In this work we show for the first time that we can reproduce this ability with a computer algorithm. We build on a recently introduced technique for measuring texture similarity and show that it can recognize initial letters with an accuracy that rivals or exceeds human performance. A brute-force implementation of this measure would require several years to process a single large book; however, we introduce a novel lower bound that allows us to process the books in minutes.
Similar resources
Provenance as Data Mining: Combining File System Metadata with Content Analysis
Provenance describes how an object came to be in its present state. Thus, it describes the evolution of the object over time. Prior work on provenance has focussed on databases and the file system. The database or file system is enhanced or augmented in order to capture additional information about the historical evolution of document collections, and thus answer the provenance question. We add...
The Archival Principle of Provenance and Its Application to Image Content Management Systems
Variously described as a "powerful guiding principle" (Dearstyne, 1993), and "the only principle" of archival theory (Horsman, 1994), the Principle of Provenance distinguishes the archival profession from other information professions in its focus on a document's context, use and meaning. This Principle, generally concerned with the origin of records, has three distinct meanings (Bellardo &...
Margins are more important than text, Historical values of margins, memorial notes and colophons of Manuscripts in Zoroastrian tradition
In the Zoroastrian tradition, the most important challenge and the most ambiguous issue is the ambiguity of history and the neglect of time and chronology. Perhaps this ambiguity of history stems from the view that historical time is limited, that the beginning and end of time are clear, and that goodness will eventually prevail. From early times until now, the Zoroastrian re...
Provenance, Tectonic Setting & Geochemical Maturity of The Early Miocene Pyawbwe Formation, Sakangyi–Thayet Area, Magway Region, Myanmar.
Abstract: The best-exposed Early Miocene (820 m thick) shales and interbedded silty sandstone beds of the Pyawbwe Formation in the Sakangyi-Thayet area, Magway Region, are investigated geochemically using a Siemens SRS- X Ray 303 AS XRF Spectrometer. Major and some trace element concentrations have been determined to establish their provenance, tectonic setting, paleoweathering, paleoclimate and ...
Provenance for Data Mining
Data mining aims at extracting useful information from large datasets. Most data mining approaches reduce the input data to produce a smaller output summarizing the mining result. While the purpose of data mining (extracting information) necessitates this reduction in size, the loss of information it entails can be problematic. Specifically, the results of data mining may be more confusing than...
Publication date: 2012